
Wfproc feature/hybrid artwork detection #57

Open
exdysa wants to merge 3 commits into main from wfproc-feature/hybrid-artwork-detection


@exdysa
Member

exdysa commented Apr 6, 2026

No description provided.

wfproc and others added 3 commits March 25, 2026 19:06
Confirms that CLIP-based detection is biased toward generators that use
CLIP internally. Tested on Defactify MS-COCOAI dataset (96K images, 5
labeled generators, semantically matched captions):

  Generator      Uses CLIP?  Hand-crafted  CLIP    Delta
  SD 2.1         YES         86.5%         96.1%   +9.6pp
  SDXL           YES         93.5%         99.0%   +5.5pp
  SD 3           YES         85.4%         97.5%   +12.1pp
  Midjourney v6  Unknown     88.5%         99.5%   +11.0pp
  DALL-E 3       NO          98.7%         98.2%   -0.5pp

CLIP advantage on confirmed CLIP generators (SD 2.1, SDXL, SD 3): +9.1pp average
CLIP advantage on non-CLIP generators (DALL-E 3 only): -0.5pp (hand-crafted wins)
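
As a quick check on the arithmetic, the deltas and group averages can be recomputed from the table. This is only a sketch: the `RESULTS` layout and the helper names are made up for illustration and are not taken from the PR's code. It assumes the "Unknown" row (Midjourney v6) is left out of both averages, which is the only grouping that reproduces the +9.1pp figure.

```python
# Accuracy figures copied from the table: (uses_clip, hand_crafted_pct, clip_pct).
# The dict layout and function names are illustrative, not the PR's own code.
RESULTS = {
    "SD 2.1": ("yes", 86.5, 96.1),
    "SDXL": ("yes", 93.5, 99.0),
    "SD 3": ("yes", 85.4, 97.5),
    "Midjourney v6": ("unknown", 88.5, 99.5),
    "DALL-E 3": ("no", 98.7, 98.2),
}


def delta_pp(hand_crafted, clip):
    """CLIP accuracy minus hand-crafted accuracy, in percentage points."""
    return round(clip - hand_crafted, 1)


def mean_delta(uses_clip):
    """Average delta over generators whose 'Uses CLIP?' flag matches."""
    deltas = [delta_pp(h, c) for flag, h, c in RESULTS.values() if flag == uses_clip]
    return round(sum(deltas) / len(deltas), 1)


clip_advantage = mean_delta("yes")     # SD 2.1, SDXL, SD 3
non_clip_advantage = mean_delta("no")  # DALL-E 3 only
print(f"CLIP generators: {clip_advantage:+.1f}pp, non-CLIP: {non_clip_advantage:+.1f}pp")
```

Including Midjourney v6 in the CLIP group would give a different average, so the grouping matters if its architecture is ever confirmed.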

Replaces per-experiment PDFs with single consolidated research report
(negate_research_report.pdf) covering all experiments, scaling analysis,
CLIP bias findings, and recommended next steps.

+ 156 handcrafted features (was 49) + 768 frozen ConvNeXt-Tiny features
+ fine-tuned ConvNeXt anime veto model
+ 3-model ensemble (LightGBM + SVM + RF) with calibrated 3-class output
+ pause/resume feature extraction cache system
~ feature_artwork.py expanded with Gabor, wavelets, fractal, JPEG ghost,
  mid-band frequency, patch consistency, linework analysis
- removed dead-end test scripts and outdated results from PRs #51/#52

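
For readers unfamiliar with the idea, a pause/resume extraction cache might look roughly like the sketch below. Everything here is an assumption for illustration: the `FeatureCache` class, the JSON Lines file layout, and the `fake_extract` stand-in are hypothetical and are not the implementation in this PR.

```python
import json
import tempfile
from pathlib import Path


class FeatureCache:
    """Hypothetical on-disk cache so an interrupted extraction run can resume."""

    def __init__(self, path):
        self.path = Path(path)
        self.rows = {}
        # Reload anything written by an earlier (possibly interrupted) run.
        if self.path.exists():
            with self.path.open() as handle:
                for line in handle:
                    record = json.loads(line)
                    self.rows[record["key"]] = record["features"]

    def extract_all(self, keys, extract_fn):
        """Run extract_fn only for keys that are not cached yet."""
        with self.path.open("a") as handle:
            for key in keys:
                if key in self.rows:
                    continue  # computed in a previous run, skip it
                features = extract_fn(key)
                # Append and flush per item so a crash loses at most one result.
                handle.write(json.dumps({"key": key, "features": features}) + "\n")
                handle.flush()
                self.rows[key] = features
        return [self.rows[key] for key in keys]


calls = []


def fake_extract(key):
    """Stand-in for a real feature extractor; records which keys were computed."""
    calls.append(key)
    return [float(len(key))]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "features.jsonl"
    first = FeatureCache(cache_file).extract_all(["a.png", "bb.png"], fake_extract)
    # A fresh instance simulates restarting the process after a pause.
    resumed = FeatureCache(cache_file).extract_all(
        ["a.png", "bb.png", "ccc.png"], fake_extract
    )
```

Appending one record per item keeps the resume logic simple; a real cache would probably also need to key on the feature-set version so stale entries are not reused after the extractor changes.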
@exdysa
Member Author

exdysa commented Apr 6, 2026

  • going to trim this down to be more congruent with current structure
  • ideally modular drop-in replacement for wavelet.py or residuals.py classes
  • gotta read all these papers too
  • preferred tests are unit tests with mocking rather than isolated runs. will elaborate in contributing.md/claude.md on what tests should do and how they should be written.
  • once structure is correct will try training models
  • new LearnedExtract class in particular follows code structure excellently, maybe last claude.md updates were useful? will ensure that agent md files thoroughly align with contributing info for this and future collaborations
  • i might need to redo cross-validation stuff in tests
  • ought to avoid downloading data or models in any tests
  • i don't know that i can confirm DALL-E 3 isn't using CLIP, so i ought to read up to be certain
  • fix whatever else is failing in tests
  • probably more i can think of later...
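
A rough idea of what the testing points above could look like in practice (unit tests with mocking, no model or data downloads). The `BackboneExtractor` class and its `_load_model` method are hypothetical stand-ins, not the repo's `LearnedExtract` or any real API; only `unittest` and `unittest.mock` from the standard library are used.

```python
import unittest
from unittest import mock


class BackboneExtractor:
    """Hypothetical extractor whose model loader would normally fetch weights."""

    def _load_model(self):
        # In real code this might download weights; tests must never reach it.
        raise RuntimeError("tests must not download models")

    def __call__(self, pixels):
        model = self._load_model()
        return model(pixels)


class BackboneExtractorTest(unittest.TestCase):
    def test_call_uses_model_without_downloading(self):
        fake_model = mock.Mock(return_value=[0.0, 1.0])
        # Replace the loader on the class so no network or disk access happens.
        with mock.patch.object(
            BackboneExtractor, "_load_model", return_value=fake_model
        ) as loader:
            features = BackboneExtractor()([1, 2, 3])
        self.assertEqual(features, [0.0, 1.0])
        loader.assert_called_once_with()
        fake_model.assert_called_once_with([1, 2, 3])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(BackboneExtractorTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Patching the loader rather than the whole extractor means the test still exercises the real call path, which is usually the part worth checking.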

@exdysa
Member Author

exdysa commented Apr 12, 2026

code migrated to #58 as individual modules by necessity (too many merge issues).
leaving this open until all the code has been migrated.
